An Analysis of Direct Reinforcement Learning in Non-Markovian Domains
Abstract
It is well known that for Markov decision processes, the policies stable under policy iteration and the standard reinforcement learning methods are exactly the optimal policies. In this paper, we investigate the conditions for policy stability in the more general situation where the Markov property cannot be assumed. We show that for a general class of non-Markov decision processes, if actual-return (Monte Carlo) credit assignment is used with undiscounted returns, the optimal observation-based policies are still guaranteed to be equilibrium points in the policy space under the standard "direct" reinforcement learning approaches. However, if either discounted rewards or a temporal-differences style of credit assignment is used, this is not the case.
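To make the abstract's contrast concrete, here is a minimal sketch in Python. It is not taken from the paper: the tabular setting, the episode format, and all names (q_mc, q_td, alpha, gamma) are illustrative assumptions. It contrasts undiscounted actual-return (Monte Carlo) credit assignment with a discounted one-step TD-style update over observation-action values.

    from collections import defaultdict

    alpha = 0.1   # step size
    gamma = 0.9   # discount factor (used by the TD-style variant only)

    q_mc = defaultdict(float)  # observation-action values, Monte Carlo
    q_td = defaultdict(float)  # observation-action values, TD(0)-style

    def mc_update(episode):
        # Actual-return credit assignment with undiscounted returns: each
        # (observation, action) pair moves toward the sum of all rewards
        # received from that step to the end of the episode.
        for t, (obs, act, _) in enumerate(episode):
            g = sum(r for _, _, r in episode[t:])  # undiscounted actual return
            q_mc[(obs, act)] += alpha * (g - q_mc[(obs, act)])

    def td_update(obs, act, reward, next_obs, next_act):
        # One-step TD-style credit assignment: bootstrap from the estimated
        # value of the next observation instead of the observed return.
        target = reward + gamma * q_td[(next_obs, next_act)]
        q_td[(obs, act)] += alpha * (target - q_td[(obs, act)])

    # Example episode of (observation, action, reward) triples.
    episode = [("o1", "left", 0.0), ("o2", "right", 1.0), ("o1", "left", 2.0)]
    mc_update(episode)
    td_update("o1", "left", 0.0, "o2", "right")

Under observation aliasing (here "o1" may label two different underlying states), the bootstrapped TD target mixes value estimates across those states; that is the kind of effect behind the abstract's negative result for discounted and TD-style credit assignment.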
Similar resources
An Echo State Model of Non-Markovian Reinforcement Learning
There exists a growing need for intelligent, autonomous control strategies that operate in real-world domains. Theoretically the state-action space must exhibit the Markov property in order for reinforcement learning to be applicable. Empirical evidence, however, suggests that reinforcement learning also applies to doma...
Description and Acquirement of Macro-Actions in Reinforcement Learning
Reinforcement learning is a framework that enables agents to learn from interaction with their environments. It has generally focused on Markov decision process (MDP) domains, but real-world domains may be non-Markovian. In this paper, we develop a new description of macro-actions for non-Markov decision process (NMDP) domains in reinforcement learning. A macro-action is an action control struct...
Reinforcement Learning through Global Stochastic Search in N-MDPs
Reinforcement Learning (RL) in either fully or partially observable domains usually poses a requirement on the knowledge representation in order to be sound: the underlying stochastic process must be Markovian. In many applications, including those involving interactions between multiple agents (e.g., humans and robots), sources of uncertainty affect rewards and transition dynamics in such a wa...
On using discretized Cohen-Grossberg node dynamics for model-free actor-critic neural learning in non-Markovian domains
We describe how multi-stage non-Markovian decision problems can be solved using actor-critic reinforcement learning by assuming that a discrete version of Cohen-Grossberg node dynamics describes the node-activation computations of a neural network (NN). Our NN (i.e., agent) is capable of rendering the process Markovian implicitly and automatically in a totally model-free fashion without learning...
C-trace: a New Algorithm for Reinforcement Learning of Robotic Control
There has been much recent interest in the potential of using reinforcement learning techniques for control in autonomous robotic agents. How to implement effective reinforcement learning in a real-world robotic environment still involves many open questions. Are standard reinforcement learning algorithms like Watkins' Q-learning appropriate, or are other approaches more suitable? Some specific...
Publication date: 1998